The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue
Authors
Abstract
This paper describes a recently completed common resource for the study of spoken discourse, the NXT-format Switchboard Corpus. Switchboard is a long-standing corpus of telephone conversations (Godfrey et al., 1992). We have brought together transcriptions with existing annotations for syntax, disfluency, speech acts, animacy, information status, coreference, and prosody; along with substantial new annotations of focus/contrast, more prosody, syllables and phones. The combined corpus uses the format of the NITE XML Toolkit, which allows these annotations to be browsed and searched as a coherent set (Carletta et al., 2005). The resulting corpus is a rich resource for the investigation of the linguistic features of dialogue and how they interact. As well as describing the corpus itself, we discuss our approach to overcoming issues involved in such a data integration project, relevant to both users of the corpus and others in the language resource community undertaking similar projects.
Similar resources
Reverse Engineering of Network Software Binary Codes for Identification of Syntax and Semantics of Protocol Messages
Reverse engineering of network applications, especially from the security point of view, is of high importance and interest. Many network applications use proprietary protocols whose specifications are not publicly available. Reverse engineering of such applications could provide us with vital information to understand their embedded unknown protocols. This could facilitate many tasks including d...
Using the NITE XML Toolkit on the Switchboard Corpus to Study Syntactic Choice: a Case Study
The NITE XML Toolkit (NXT) provides library support for working with multimodal language corpora. We describe our experiences in using it to study discourse effects on syntactic choice using the parsed Switchboard Corpus as a starting point, as a case study for others who may wish to adopt similar techniques using NXT or one of the other libraries that are beginning to emerge. We discuss conver...
Perceiving surprise on cue words: prosody and semantics interact on right and really
Cue words in dialogue have different interpretations depending on context and prosody. This paper presents a corpus study and perception experiment investigating when prosody causes right and really to be perceived as questioning or expressing surprise. Pitch range is found to be the best cue for surprise. This extends to the question rating for really but not for right. In fact, prosody appears t...
ADAM: The SI-TAL Corpus of Annotated Dialogues
In this paper we describe the methodological assumptions, general architectural framework and annotation and encoding practices underlying the ADAM Corpus, which has been developed as part of the Italian national project SI-TAL. Each of the 450 dialogues is represented by an orthographic transcription and is annotated at five levels of linguistic information, namely prosody, pos tagging, syntax...
Dissertation Proposal Dialogue Glue: Cue Words and Prosody
This dissertation is about how semantics, pragmatics and speech prosody conspire to glue a dialogue together. This proposal focuses on the meaning and use of cue words. This set of discourse markers includes backchannels like uh-huh and okay, agreements like right, and questioning particles like really. Understanding these sorts of markers is important because they indicate both when things a...
Journal title:
- Language Resources and Evaluation
Volume 44, Issue
Pages -
Publication date: 2010